Using Mechanical Turk to Obtain and Analyze English Acceptability Judgments

Authors

Edward Gibson

Steve Piantadosi

Kristina Fedorenko

Abstract

The prevalent method in theoretical syntax and semantics research involves obtaining a judgment of the acceptability of a sentence/meaning pair, typically by just the author of the paper, sometimes with feedback from colleagues. The weakness of the traditional non-quantitative single-sentence/single-participant methodology, along with the existence of cognitive and social biases, has the unwanted effect that claims in the syntax and semantics literature cannot be trusted. Even if most of the judgments in an arbitrary syntax/semantics paper can be substantiated with rigorous quantitative experiments, the existence of a small set of judgments that do not conform to the authors' intuitions can have a large effect on the potential theories. Whereas it is clearly desirable to quantitatively evaluate all syntactic and semantic hypotheses, it has been time-consuming in the past to find a large pool of naïve experimental participants for behavioral experiments. The advent of Amazon.com's Mechanical Turk now makes this process very simple. Mechanical Turk is a marketplace interface that can be used for collecting behavioral data over the internet quickly and inexpensively. The cost of using an interface like Mechanical Turk is minimal, and the time that it takes for the results to be returned is very short. Many linguistic surveys can be completed within a day, at a cost of less than $50. In this paper, we provide detailed instructions for how to use our freely available software in order to (a) post linguistic acceptability surveys to Mechanical Turk; and (b) extract and analyze the resulting data.
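The abstract names two steps: (a) posting acceptability surveys and (b) analyzing the returned ratings. The Python sketch below illustrates two techniques commonly used for such steps. It is not the authors' software: the function names, the integer coding of conditions, and the input format (tuples of participant ID, condition, numeric rating) are hypothetical. `latin_square_lists` spreads items across conditions in a Latin square, so each list contains every item exactly once, in one condition only; `zscore_by_participant` rescales each participant's ratings to z-scores so that people who use the rating scale differently become comparable; `condition_means` then averages the z-scores per condition. Randomized presentation order and filler items, which a real survey would need, are omitted.

```python
from collections import defaultdict
from statistics import mean, pstdev


def latin_square_lists(n_items, n_conditions):
    """Build one survey list per condition (hypothetical helper).

    List k assigns item i to condition (i + k) % n_conditions, so every list
    contains each item exactly once, and across all lists each item appears
    in each condition exactly once.
    """
    return [
        [(item, (item + k) % n_conditions) for item in range(n_items)]
        for k in range(n_conditions)
    ]


def zscore_by_participant(ratings):
    """Convert raw ratings to within-participant z-scores.

    ratings: list of (participant, condition, score) tuples (assumed format).
    Returns the same tuples with each score replaced by its z-score relative
    to that participant's own mean and population standard deviation.
    """
    by_participant = defaultdict(list)
    for participant, _, score in ratings:
        by_participant[participant].append(score)

    stats = {}
    for participant, scores in by_participant.items():
        sd = pstdev(scores)
        # A participant who gave identical ratings everywhere gets z = 0
        # rather than a division by zero.
        stats[participant] = (mean(scores), sd if sd > 0 else 1.0)

    return [
        (p, c, (s - stats[p][0]) / stats[p][1]) for p, c, s in ratings
    ]


def condition_means(zscored):
    """Average the z-scored ratings within each condition."""
    by_condition = defaultdict(list)
    for _, condition, z in zscored:
        by_condition[condition].append(z)
    return {condition: mean(zs) for condition, zs in by_condition.items()}


if __name__ == "__main__":
    print(latin_square_lists(4, 2))
    ratings = [
        ("w1", "grammatical", 7), ("w1", "ungrammatical", 3),
        ("w2", "grammatical", 5), ("w2", "ungrammatical", 1),
    ]
    print(condition_means(zscore_by_participant(ratings)))
```

In this toy example the two participants rate the same contrast as 7 vs. 3 and 5 vs. 1; after z-scoring both contribute 1.0 and -1.0, so the condition means are 1.0 for the grammatical condition and -1.0 for the ungrammatical one, even though the raw ratings differ.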

Similar resources

An Analysis of Assessor Behavior in Crowdsourced Preference Judgments

We describe a pilot study using Amazon’s Mechanical Turk to collect preference judgments between pairs of full-page layouts including both search results and image results. Specifically, we analyze the behavior of assessors that participated in our study to identify some patterns that may be broadly indicative of unreliable assessments. We believe this analysis can inform future experimental de...

Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon's Mechanical Turk

Manual evaluation of translation quality is generally thought to be excessively time-consuming and expensive. We explore a fast and inexpensive way of doing it using Amazon's Mechanical Turk to pay small sums to a large number of non-expert annotators. For $10 we redundantly recreate judgments from a WMT08 translation task. We find that when combined non-expert judgments have a high level of ag...

A validation of Amazon Mechanical Turk for the collection of acceptability judgments in linguistic theory

Amazon's Mechanical Turk (AMT) is a Web application that provides instant access to thousands of potential participants for survey-based psychology experiments, such as the acceptability judgment task used extensively in syntactic theory. Because AMT is a Web-based system, syntacticians may worry that the move out of the experimenter-controlled environment of the laboratory and onto the user-co...

Crowdsourcing Music Similarity Judgments using Mechanical Turk

Collecting human judgments for music similarity evaluation has always been a difficult and time-consuming task. This paper explores the viability of Amazon Mechanical Turk (MTurk) for collecting human judgments for audio music similarity evaluation tasks. We compared the similarity judgments collected from Evalutron6000 (E6K) and MTurk using the Music Information Retrieval Evaluation eXchange 2...

Columbia MVSO Image Sentiment Dataset

The Multilingual Visual Sentiment Ontology (MVSO) consists of 15,600 concepts in 12 different languages that are strongly related to emotions and sentiments expressed in images. These concepts are defined in the form of Adjective-Noun Pairs (ANPs), which are crawled and discovered from the online image forum Flickr. In this work, we used Amazon Mechanical Turk as a crowd-sourcing platform to collect ...

Journal:

Language and Linguistics Compass

Volume 5, Issue

Pages: -

Publication date: 2011